AP® STATISTICS 
2006 SCORING GUIDELINES (Form B) 


Question 4 


Intent of Question 


The primary goal of this question is to assess a student’s ability to conduct a test of significance by stating the 
hypotheses of interest, checking the necessary conditions, calculating the test statistic and p-value, and making a 
conclusion in context. 


Solution 
Step 1: States a correct pair of hypotheses. 


Let $\mu_D$ denote the mean difference (after − before) in dexterity scores for the population of individuals
enrolled in the program.
$H_0: \mu_D = 0$ versus $H_a: \mu_D > 0$


Step 2: Identifies a correct test (by name or formula) and checks appropriate conditions. 


One-sample t-test or paired t-test or $t = \dfrac{\bar{x}_D}{s_D / \sqrt{n}}$.
We are told that the 12 people are a random sample. Assume that the differences (after − before) are
approximately normal. This check may be done with a histogram, dotplot, stem-and-leaf display, or normal
probability plot. The student should note that the normal assumption is not unreasonable because the plot
displays no obvious skewness or outliers.


Step 3: Correct mechanics, including the value of the test statistic and the p-value (or rejection region). 


$\bar{x}_D = 0.375$, $s_D = 0.367$

Degrees of freedom = 12 − 1 = 11

$t = \dfrac{0.375}{0.367 / \sqrt{12}} = 3.54$


p-value = 0.002 
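
The Step 3 mechanics can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and the upper-tail probability is approximated by Simpson's rule on the t density rather than taken from a statistics package.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=20000):
    """Approximate P(T > t) for t >= 0 with Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # By symmetry, P(T > 0) = 0.5; subtract the area between 0 and t.
    return 0.5 - total * h / 3

def paired_t_from_summary(mean_diff, sd_diff, n):
    """One-sided paired t-test (H_a: mu_D > 0) from summary statistics of the differences."""
    standard_error = sd_diff / math.sqrt(n)
    t_stat = mean_diff / standard_error
    df = n - 1
    return t_stat, df, t_upper_tail(t_stat, df)

# Summary values taken from the solution: mean 0.375, SD 0.367, n = 12.
t_stat, df, p_value = paired_t_from_summary(0.375, 0.367, 12)
print(round(t_stat, 2), df, round(p_value, 3))
```

The printed test statistic and degrees of freedom should agree with the values shown in Step 3, and the p-value should round to the reported 0.002.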








Step 4: States a correct conclusion in the context of the problem. 
Since the p-value is less than 0.05, we can reject the null hypothesis of no difference in favor of the
alternative and conclude that, on average, people who completed the program have significantly increased
manual dexterity.


Scoring 


Each of the four steps is scored as essentially correct (E) or incorrect (I).


Notes for Step 2: 
Although it is not recommended, a one-sample confidence interval for $\mu_D$ could be used to test the
hypotheses in Step 1. An appropriate adjustment to the confidence level must be made since we are
conducting a one-sided test. The correct formula is

$\left(\bar{x}_D - t_{n-1,\alpha}\,\dfrac{s_D}{\sqrt{n}},\ \infty\right) \Rightarrow \left(0.375 - 1.796 \times \dfrac{0.367}{\sqrt{12}},\ \infty\right) \Rightarrow (0.1847,\ \infty)$.

The null hypothesis of no change in mean dexterity scores is rejected at the 0.05 level of significance
because the left end of this 95 percent one-sided confidence interval is above zero. If the t-value used in
constructing the confidence interval does not match the significance level given in the conclusion, then
the maximum score for Step 4 is partially correct (P).
If an incorrect two-sample procedure is used, then Step 2 is scored as incorrect. The maximum score for a
two-sample t procedure is 3.
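
The one-sided interval described in the notes for Step 2 can be reproduced with a few lines of arithmetic. This is a minimal sketch with a hypothetical function name; the critical value 1.796 is taken from the guidelines rather than computed.

```python
import math

def one_sided_lower_bound(mean_diff, sd_diff, n, t_crit):
    """Lower endpoint of the one-sided interval (lower, infinity) for the mean difference."""
    return mean_diff - t_crit * sd_diff / math.sqrt(n)

# Values from the guidelines: mean 0.375, SD 0.367, n = 12, t critical value 1.796.
lower = one_sided_lower_bound(0.375, 0.367, 12, 1.796)

# The null hypothesis mu_D = 0 is rejected when the whole interval lies above zero.
reject_null = lower > 0
print(round(lower, 4), reject_null)  # 0.1847 True
```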





Incorrect Solutions for Step 2

Procedure            df       Test Statistic    p-value
Two-sample t-test    21.98    t = 1.05          0.153
Pooled t-test        22       t = 1.05          0.152




















A response using separate confidence intervals for the two means is also scored as incorrect for Step 2. 
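
The two-sample procedures above are scored as incorrect because the before and after scores come from the same 12 people, so they are not independent samples. The comparison below is a sketch of where the two standard errors differ. The symbols $\sigma_A$ and $\sigma_B$ (standard deviations of the after and before scores) and $\rho$ (their correlation) are notation introduced here for illustration and do not appear in the original guidelines.

```latex
\operatorname{Var}(\bar{x}_D)
  = \frac{\sigma_A^2 + \sigma_B^2 - 2\rho\,\sigma_A\sigma_B}{n}
\qquad\text{versus}\qquad
\frac{\sigma_A^2}{n} + \frac{\sigma_B^2}{n}
\quad\text{(two-sample form, which assumes } \rho = 0\text{)}
```

When $\rho > 0$, the paired standard error is smaller, so the same mean difference gives a larger t statistic. This is consistent with t = 3.54 for the paired test against t = 1.05 for the two-sample tests.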


Notes for Step 3: 


An identifiable minor arithmetic error in Step 3 will not necessarily change a score from essentially
correct to incorrect. 


If the student argues that the normal distribution is not reasonable, then they may use hypothesis tests for
the median.

Other Solutions for Step 3

Procedure                    Test Statistic    p-value
Sign Test                    B = 8             0.0547
Wilcoxon Signed Rank Test    W = 52            0.007
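
The sign test p-value in the table is an upper-tail binomial probability. The sketch below uses a hypothetical function name. It assumes that 10 of the 12 differences are nonzero, which the guidelines do not state; that count is an assumption chosen because it reproduces the reported p-value for B = 8.

```python
from math import comb

def sign_test_upper_p(num_positive, num_nonzero):
    """P(X >= num_positive) for X ~ Binomial(num_nonzero, 0.5)."""
    tail = sum(comb(num_nonzero, k) for k in range(num_positive, num_nonzero + 1))
    return tail / 2 ** num_nonzero

# B = 8 positive differences; 10 nonzero differences is an assumption (zeros dropped).
p_sign = sign_test_upper_p(8, 10)
print(round(p_sign, 4))  # 0.0547
```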














If the p-value is incorrect but the conclusion in Step 4 is consistent with the computed p-value, Step 4 can 
be considered essentially correct. 


Notes for Step 4: 
If both an α and a p-value are given, the linkage in Step 4 is implied. If no α is given, the solution must be
explicit about the linkage by giving a correct interpretation of the p-value or explaining how the
conclusion follows from the p-value.
If the hypotheses are reversed in Step 1 (i.e., $H_0: \mu_D > 0$ versus $H_a: \mu_D = 0$), then the conclusion also
needs to be reversed. Otherwise, both parts should be scored as incorrect (I).


4. The developers of a training program designed to improve manual dexterity claim that people who complete the
6-week program will increase their manual dexterity. A random sample of 12 people enrolled in the training
program was selected. A measure of each person’s dexterity on a scale from 1 (lowest) to 9 (highest) was
recorded just before the start of and just after the completion of the 6-week program. The data are shown in the
table below.

[Table of before and after dexterity scores not recoverable from the source.]

Can one conclude that the mean manual dexterity for people who have completed the 6-week training program
has significantly increased? Support your conclusion with appropriate statistical evidence.


Sample: 4A 
Score: 4 


This essay recognizes the paired nature of the data and correctly specifies a one-sample t-test for the mean
difference. The null hypothesis and the appropriate one-sided alternative hypothesis are clearly stated, with notation
for the population means clearly defined. This enables one to know that the symbol for the mean difference refers
to the mean manual dexterity before entering the program minus the mean manual dexterity after completing the
program. It is important that the direction of this difference is well defined. This essay addresses the model
assumptions underlying the use of the one-sample t-test by presenting a histogram of the observed differences to
check for outliers and the shape of the distribution of differences and concludes that the data in this small sample
present no strong reason to doubt the assumption of a normal distribution for differences. The essay also notes
that a simple random sample of subjects was provided in the stem of the question. Simple random sampling could
be used to help justify the assumption that the subjects respond independently of each other and are representative
of the population from which they were selected. The value of the test statistic, the degrees of freedom, and the
p-value are correctly calculated. A correct conclusion is reached in the context of the problem and justified by
comparing the p-value to a 0.01 significance level.


Sample: 4B 
Score: 3 


A one-sample t-test for the mean difference is identified by statement and by formula. However, the null
hypothesis and one-sided alternative hypothesis are in the wrong direction. A histogram of the observed
differences is used to check for outliers and the shape of the distribution of differences. The standard deviation for
the differences is not computed correctly, but the t-statistic is correctly evaluated from the incorrect standard
deviation. The p-value is very large, but it is consistent with the stated null and alternative hypotheses, and the
conclusion that is reached is also consistent with the p-value and reversed hypotheses. This essay recognizes the
paired nature of the data and shows a good understanding of computing a t-statistic and reaching a conclusion, but
there is some confusion in determining the appropriate null and alternative hypotheses from the context of the
problem.


Sample: 4C 
Score: 2 


This essay fails to recognize that before and after responses should be treated as paired data instead of independent 
samples. Appropriate null and alternative hypotheses are stated, but a t-test for two independent samples is specified. 
There are no checks of the assumptions of normality and independent samples. The two-sample t-test is correctly 
evaluated, but it yields a p-value that is larger than the p-value for the correct paired t-test because the strong positive 
correlation between the before and after measurements on these 12 subjects is ignored by the two-sample test. While 
the conclusion is consistent with the large p-value for the two-sample t-test, it appears to accept the null hypothesis
instead of indicating that the two-sample t-test does not provide sufficient evidence to conclude that the training
increased mean dexterity.